Machine learning of probabilistic phonological pronunciation rules from the Italian CLIPS corpus

Authors

  • Florian Schiel
  • Mary Stevens
  • Uwe D. Reichel
  • Francesco Cutugno
Abstract

A blending of phonological concepts and technical analysis is proposed to yield a better modeling and understanding of phonological processes. Based on the manual segmentation and labeling of the Italian CLIPS corpus, we automatically derive a probabilistic set of phonological pronunciation rules: a new alignment technique is used to map the phonological form of spontaneous sentences onto the phonetic surface form. A machine-learning algorithm then calculates a set of phonological replacement rules together with their conditional probabilities. A critical analysis of the resulting probabilistic rule set is presented and discussed with regard to regional Italian accents. The rule set presented here is also applied in the newly published web service WebMAUS, which allows a user to segment and phonetically label Italian speech via a simple web interface.
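The two steps named in the abstract (aligning the phonological form with the surface form, then estimating replacement rules with conditional probabilities) can be illustrated with a minimal sketch. It is not the authors' method: the alignment here is a generic stand-in based on Python's `difflib`, the rule format (left neighbor, phone, right neighbor) is an assumption, insertions are ignored, and the function names `align` and `learn_rules` as well as the toy phone strings are hypothetical.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def align(canonical, surface):
    """Pair each canonical phone index with its surface realization.

    An empty string marks a deleted phone. This generic alignment is an
    assumption; the paper's own alignment technique is not reproduced here.
    """
    pairs = []
    matcher = SequenceMatcher(a=canonical, b=surface, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            continue  # surface-only phones are ignored in this sketch
        for k in range(i2 - i1):
            j = j1 + k
            realized = surface[j] if j < j2 else ""
            pairs.append((i1 + k, realized))
    return pairs


def learn_rules(corpus):
    """Estimate P(realization | left neighbor, phone, right neighbor)."""
    counts = defaultdict(Counter)
    for canonical, surface in corpus:
        padded = ["#"] + list(canonical) + ["#"]  # "#" marks an utterance edge
        for idx, realized in align(list(canonical), list(surface)):
            context = (padded[idx], canonical[idx], padded[idx + 2])
            counts[context][realized] += 1
    rules = {}
    for context, outcomes in counts.items():
        total = sum(outcomes.values())
        for realized, n in outcomes.items():
            rules[(context, realized)] = n / total  # relative frequency
    return rules


# Toy data (hypothetical, not taken from CLIPS): the final "n" is
# deleted in one of two utterances.
toy_corpus = [
    (["n", "o", "n"], ["n", "o"]),
    (["n", "o", "n"], ["n", "o", "n"]),
]
toy_rules = learn_rules(toy_corpus)
for (context, realized), prob in sorted(toy_rules.items()):
    print(context, "->", repr(realized), prob)
```

Relative frequencies are the simplest possible estimator; a real rule set learned from a corpus would likely need pruning of rare contexts and some form of smoothing.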


Similar resources

A Transformation-Based Learning Method on Generating Korean Standard Pronunciation

In this paper, we propose a Transformation-Based Learning (TBL) method on generating the Korean standard pronunciation. Previous studies on the phonological processing have been focused on the phonological rule applications and the finite state automata (Johnson 1984; Kaplan and Kay 1994; Koskenniemi 1983; Bird 1995). In case of Korean computational phonology, some former researches have approa...


Independent automatic segmentation by self-learning categorial pronunciation rules

The goal of this paper is to present a new method to automatically generate pronunciation rules for automatic segmentation of speech: the German MAUSER system. MAUSER is an algorithm which generates pronunciation rules independently of any domain dependent training data either by clustering and statistically weighting self-learned rules according to a small set of phonological rules clustered by...


An interactive English pronunciation dictionary for Korean learners

We present research towards developing a pronunciation dictionary that features sensitivity to learners’ native phonology, specifically designed for Korean learners of English-as-a-Foreign-Language (EFL). We envision a future system that can record and process learners’ imitation of the dictionary pronunciation and instantly provide segmental and prosodic feedback on accent. Towards this goal, ...


Automatic derivation of phonological rules for mispronunciation detection in a computer-assisted pronunciation training system

Computer-Assisted Pronunciation Training System (CAPT) has become an important learning aid in second language (L2) learning. Our approach to CAPT is based on the use of phonological rules to capture language transfer effects that may cause mispronunciations. This paper presents an approach for automatic derivation of phonological rules from L2 speech. The rules are used to generate an extended...


Building multiple pronunciation models for novel words using exploratory computational phonology

In this paper we describe a completely automatic algorithm that builds multiple pronunciation word models by expanding baseform pronunciations with a set of candidate phonological rules. We show how to train the probabilities of these phonological rules, and how to use these probabilities to assign pronunciation probabilities to words not seen in the training corpus. The algorithm we propose is...



Publication date: 2013